
    KGI: An Integrated Framework for Knowledge Intensive Language Tasks

    In a recent work, we presented a novel state-of-the-art approach to zero-shot slot filling that extends dense passage retrieval with hard negatives and robust training procedures for retrieval-augmented generation models. In this paper, we propose a system based on an enhanced version of this approach, where we train task-specific models for other knowledge-intensive language tasks, such as open-domain question answering (QA), dialogue, and fact checking. Our system achieves results comparable to the best models on the KILT leaderboards. Moreover, given a user query, we show how the outputs from these different models can be combined to cross-examine each other. In particular, we show how accuracy in dialogue can be improved using the QA model. A short video demonstrating the system is available here - \url{https://ibm.box.com/v/kgi-interactive-demo}
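    To make the retrieve-then-generate pipeline concrete, below is a minimal sketch of retrieval-augmented generation: a dense retriever ranks passages against the query, and a generator conditions on the query plus the top passages. The toy hashing embedding and all names here are illustrative assumptions, not the KGI implementation.

```python
# Minimal retrieval-augmented generation sketch (illustrative; not the KGI code).
# A real system would use trained dense encoders (e.g. DPR) and a seq2seq generator.
import numpy as np

DIM = 512

def embed(text: str) -> np.ndarray:
    """Toy bag-of-hashed-tokens vector standing in for a trained dense encoder."""
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[hash(tok) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Dense retrieval: rank passages by inner product with the query embedding."""
    q = embed(query)
    order = sorted(range(len(passages)), key=lambda i: -float(q @ embed(passages[i])))
    return [passages[i] for i in order[:k]]

def generate(query: str, contexts: list[str]) -> str:
    """Placeholder for a seq2seq generator conditioned on query + retrieved contexts."""
    return f"[answer to {query!r} generated from {len(contexts)} retrieved passages]"

passages = [
    "Alan Turing was an English mathematician and computer scientist.",
    "The Eiffel Tower is located in Paris.",
    "Turing is widely considered the father of theoretical computer science.",
]
query = "Who is the father of computer science?"
print(generate(query, retrieve(query, passages)))
```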

    Hypernym Detection Using Strict Partial Order Networks

    This paper introduces Strict Partial Order Networks (SPON), a novel neural network architecture designed to enforce asymmetry and transitivity as soft constraints. We apply it to induce hypernymy relations by training with is-a pairs. We also present an augmented variant of SPON that can generalize type information learned for in-vocabulary terms to previously unseen ones. An extensive evaluation over eleven benchmarks across different tasks shows that SPON consistently either outperforms or attains the state of the art on all but one of these benchmarks.
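    As a rough illustration of what enforcing asymmetry and transitivity "as soft constraints" can look like, the sketch below adds hinge penalties on top of a learned pair scorer. The scorer and the loss terms are generic placeholders, not the actual SPON formulation.

```python
# Generic soft-constraint penalties for a strict partial order over term pairs.
# Illustrative only; the real SPON architecture differs.
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """Scores (hyponym, hypernym) index pairs from learned term embeddings."""
    def __init__(self, n_terms: int, dim: int = 32):
        super().__init__()
        self.emb = nn.Embedding(n_terms, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x, y):
        return self.mlp(torch.cat([self.emb(x), self.emb(y)], dim=-1)).squeeze(-1)

def order_penalties(s, x, y, z):
    """Hinge losses pushing s toward a strict partial order:
    asymmetry:    if s(x, y) is high, s(y, x) should be low;
    transitivity: if s(x, y) and s(y, z) are high, s(x, z) should be high too."""
    asym = torch.relu(s(x, y) + s(y, x)).mean()
    trans = torch.relu(torch.minimum(s(x, y), s(y, z)) - s(x, z)).mean()
    return asym + trans

scorer = PairScorer(n_terms=100)
x, y, z = (torch.randint(0, 100, (8,)) for _ in range(3))
print(order_penalties(scorer, x, y, z))  # add to the main training loss
```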

    Oxidizability assay of unfractionated plasma of patients with different plasma profiles: a methodological study

    BACKGROUND: The present study describes an in vitro model of plasma oxidation for patients with different lipid profiles, which can be correlated to their in vivo plasma oxidizability in order to identify the patient groups prone to arterial diseases. METHOD: The method applied here to measure in vitro plasma oxidizability is convenient and can be adopted in any clinical laboratory setting. Unfractionated plasma was exposed to CuSO4 (5.0 mmol/L), a pro-oxidant, and to low-frequency ultrasonic waves to induce oxidation, and oxidizability was then quantified by the TBARS and conjugated diene methods. RESULT: In our study, plasma with LDL greater than 150 mg/dL had a 1.75 times higher risk of undergoing oxidation (CI, 0.7774 to 3.94; p = 0.071) than low-LDL plasma; the percentage of oxidation increased from 38.3% to 67.1% between LDL levels up to 150 mg/dL and those above. The lag phase, considered a measure of the plasma's antioxidative protection, was also influenced by higher LDL concentrations: the mean lag time was 65.27 ± 20.02 min (p = 0.02 compared to healthy subjects), whereas it was 94.71 ± 35.11 min for normolipidemic subjects. Plasma oxidizability also changed drastically with total cholesterol level, with oxidative susceptibility of 35% and 55.02% for levels up to 200 mg/dL and higher, respectively; however, total cholesterol did not emerge as a risk factor. Patient samples were also stratified by age, gender, and blood glucose level. Older persons (≥40 years) were at 1.096 times greater risk (95% CI, 0.5607 to 2.141; p = 0.396) than younger ones (≤39 years), males at 1.071 times (95% CI, 0.5072 to 2.264) greater risk than females, and diabetic patients at 1.091 times (CI, 0.6153 to 1.934; p = 0.391) greater risk than their non-diabetic counterparts. CONCLUSION: The method's main appeal is its easy applicability in biomedical research. Using it, we were able to show that patients with high LDL (≥150 mg/dL) are in an alarming condition, while diabetic and elderly (≥40 years) males are also considered susceptible and more prone to developing vascular diseases
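    As a side note on the risk figures quoted above, a relative risk and its 95% confidence interval can be computed from a 2x2 exposure/outcome table with the Katz log method. The counts below are invented for illustration (chosen so that RR = 1.75), so the resulting interval differs from the study's.

```python
# Worked example: relative risk with a 95% CI (Katz log method).
# Counts are hypothetical, NOT the study's raw data.
import math

def relative_risk(a, b, c, d):
    """a/b = oxidized/not-oxidized in exposed; c/d = same in unexposed."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lo, hi = (math.exp(math.log(rr) + k * 1.96 * se) for k in (-1, 1))
    return rr, lo, hi

# Hypothetical: 28/42 high-LDL samples oxidized vs 16/42 low-LDL samples.
rr, lo, hi = relative_risk(28, 14, 16, 26)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```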

    Taxonomy Construction of Unseen Domains via Graph-based Cross-Domain Knowledge Transfer

    Extracting lexico-semantic relations as graph-structured taxonomies, also known as taxonomy construction, has been beneficial in a variety of NLP applications. Recently, Graph Neural Networks (GNNs) have proven powerful in tackling many tasks, but there has been no attempt to exploit them for creating taxonomies. In this paper, we propose Graph2Taxo, a GNN-based cross-domain transfer framework for the taxonomy construction task. Our main contribution is to learn the latent features of taxonomy construction from existing domains in order to guide the structure learning of an unseen domain. We also propose a novel method of directed acyclic graph (DAG) generation for taxonomy construction. Specifically, Graph2Taxo trains on a noisy graph, constructed from automatically extracted noisy hyponym-hypernym candidate pairs, together with a set of taxonomies for known domains. The learned model is then used to generate a taxonomy for a new, unseen domain given a set of terms from that domain. Experiments on benchmark datasets from the science and environment domains show that our approach attains significant improvements over the state of the art
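    For context on the DAG-generation step, the sketch below shows a simple greedy baseline: keep hyponym-hypernym candidate edges in decreasing score order, skipping any edge that would close a cycle. This is a generic baseline for comparison, not the Graph2Taxo model itself.

```python
# Greedy DAG construction from scored hyponym-hypernym candidates (baseline sketch).

def reachable(adj, src, dst):
    """Depth-first check whether dst is reachable from src in adjacency dict adj."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return False

def build_dag(scored_edges):
    """scored_edges: iterable of (score, hyponym, hypernym) triples."""
    adj = {}
    for _, child, parent in sorted(scored_edges, reverse=True):
        if not reachable(adj, parent, child):  # adding child -> parent stays acyclic
            adj.setdefault(child, []).append(parent)
    return adj

candidates = [
    (0.9, "dog", "mammal"), (0.8, "mammal", "animal"),
    (0.7, "dog", "animal"), (0.2, "animal", "dog"),  # noisy edge: would close a cycle
]
print(build_dag(candidates))  # {'dog': ['mammal', 'animal'], 'mammal': ['animal']}
```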

    Adaptation of LIMSI's QALC for QA4MRE.

    In this paper, we present LIMSI's participation in one of the pilot tasks of QA4MRE at CLEF 2012: Machine Reading of Biomedical Texts about Alzheimer's disease. For this exercise, we adapted an existing question answering (QA) system, QALC, to search for answers in the reading document. This basic version was used for the evaluation and obtained a score of 0.2, which increased to 0.325 after basic corrections. We then developed different methods for choosing an answer, based on the expected answer type and on rewriting the question plus a candidate answer into a hypothesis that is compared with candidate sentences. We also conducted studies on relation extraction using an existing system. The final version of our system obtains 0.375
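    The hypothesis-versus-candidate-sentence idea can be illustrated with a bare-bones lexical scorer: rewrite the question plus each candidate answer into a hypothesis, then rank answers by the hypothesis's best overlap with a document sentence. QALC's actual scoring is considerably richer; this is only a sketch.

```python
# Toy answer selection: hypothesis = question + answer, scored by Jaccard overlap
# against document sentences. Illustrative only.

def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def best_answer(question: str, answers: list[str], sentences: list[str]) -> str:
    """Pick the answer whose question+answer hypothesis best matches some sentence."""
    def score(ans: str) -> float:
        hypothesis = f"{question} {ans}"
        return max(overlap(hypothesis, s) for s in sentences)
    return max(answers, key=score)

doc = ["Amyloid plaques accumulate in the brains of Alzheimer patients.",
       "The study enrolled thirty participants."]
print(best_answer("What accumulates in Alzheimer brains?",
                  ["amyloid plaques", "thirty participants"], doc))
```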

    Assessment of NER solutions against the first and second CALBC Silver Standard Corpus

    Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of a GSC is time-consuming and costly, and the final corpus consists at most of a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions: the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO), and species (SPE). This corpus was used for the First CALBC Challenge, which asked the participants to annotate the corpus with their text processing solutions. Results: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performance of the annotation solutions was lower for entities from the categories CHED and PRGE than for entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. The data sets from participants were used to generate the harmonised Silver Standard Corpus II (SSC-II), provided the participant had not made use of the annotated data set from the SSC-I for training purposes. The performance of the participants' solutions was again measured against the SSC-II and again showed better results for DISO and SPE than for CHED and PRGE. Conclusions: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier, leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs' annotation solutions in comparison to the SSC-I
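    A minimal sketch of harmonisation by voting, assuming each system emits (start, end, semantic-group) spans: an annotation enters the silver standard once enough systems agree on it. The CALBC harmonisation procedure is more involved; this only conveys the idea.

```python
# Majority-vote harmonisation of entity annotations from several systems (sketch).
from collections import Counter

def harmonise(system_annotations, min_votes):
    """system_annotations: one set of (start, end, semantic_group) spans per system."""
    votes = Counter(a for anns in system_annotations for a in anns)
    return {ann for ann, n in votes.items() if n >= min_votes}

sys_a = {(0, 7, "PRGE"), (12, 21, "DISO")}
sys_b = {(0, 7, "PRGE"), (30, 35, "SPE")}
sys_c = {(0, 7, "CHED"), (12, 21, "DISO")}
print(harmonise([sys_a, sys_b, sys_c], min_votes=2))
# -> {(0, 7, 'PRGE'), (12, 21, 'DISO')}
```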

    Improving the Effectiveness of Information Extraction from Biomedical Text

    Information extraction (IE) is the task of automatically extracting specific target information from texts by means of various natural language processing (NLP) and machine learning (ML) techniques. The huge amount of available biomedical and clinical text is an important source of undiscovered knowledge and an interesting domain where IE techniques can be applied. Although there has been a considerable amount of work on IE for other genres of text (such as newspaper articles), results of the state-of-the-art approaches for some IE tasks show that there is still need for improvement. Moreover, when these IE approaches are applied directly to biomedical/clinical data, performance drops considerably. Customizing the IE approaches with biomedical/clinical genre-specific features and pre-/post-processing techniques does improve the results (with respect to applying the approaches directly), but the situation is still not completely satisfactory. There are many ways to accomplish this goal (e.g. exploitation of the scope of negations, discourse structure, semantic roles, etc.) which are yet to be fully harnessed for the improvement of IE systems. Additional challenges come from the usage of ML techniques themselves. Imbalance in data distribution is quite common in many NLP (including IE) tasks, and previous studies have empirically shown that unbalanced datasets lead to poor performance for the minority class. In this PhD research, we aim to address the open issues outlined above. We focus on three core IE tasks which are crucial for text mining: named entity recognition (NER), coreference resolution (CoRef), and relation extraction (RE). For NER, we propose an approach for the recognition of disease entity mentions which achieves state-of-the-art performance and is later exploited as a component in our RE system. Our NER system achieves results on par with the state of the art also for other bio-entity types such as genes/proteins, species, and drugs. Since the creation of manually annotated training data is a costly process, we also investigate the practical usability of automatically annotated corpora for NER and propose how to automatically improve the quality of such corpora. CoRef, which naturally follows NER, is often deemed one of the stumbling blocks for other IE tasks such as RE. We propose a greedy, constrained CoRef approach that achieves high results in clinical texts for each individual entity mention type and for each of the four evaluation metrics usually computed for assessing systems' performance. As for RE, one of the fundamental characteristics of our approach is the exploitation of other NLP areas such as the scope of negations, elementary discourse units, and semantic roles. We propose a novel hybrid kernel that takes advantage not only of different types of information (syntactic, semantic, contextual, etc.) but also of the different ways they can be represented (i.e. flat structure, tree, graph). Our approach yields significantly better results than the previous state-of-the-art approaches for the drug-drug interaction and protein-protein interaction extraction tasks. In each of the above tasks, we concentrate on developing proactive IE approaches that automatically discard unnecessary training/test instances even before training ML models and applying them to test data. This yields better performance, thanks to a less skewed data distribution, as well as faster runtime. We tested our NER and RE approaches on other genres of text, such as newspaper articles and automatically transcribed broadcast news; the results show that our approaches are largely domain-independent
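    A minimal sketch of the proactive instance-filtering idea for relation extraction, under an invented token-distance heuristic (not the thesis's actual criterion): discarding improbable candidate pairs before training trims mostly negative instances, leaving a less skewed training set.

```python
# Proactive filtering of RE candidates before training (illustrative heuristic).

def filter_candidates(pairs, max_token_distance=10):
    """pairs: (entity1_pos, entity2_pos, label) triples; drop far-apart pairs."""
    return [p for p in pairs if abs(p[0] - p[1]) <= max_token_distance]

candidates = [(2, 5, "interacts"), (1, 40, "none"), (3, 9, "none"), (0, 55, "none")]
kept = filter_candidates(candidates)
print(kept)  # far-apart, mostly negative pairs are gone
ratio = sum(p[2] != "none" for p in kept) / len(kept)
print(f"positive ratio after filtering: {ratio:.2f}")
```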